perf: avoid unnecessary base64 conversion for aiocqhttp image/record#6850
perf: avoid unnecessary base64 conversion for aiocqhttp image/record#6850he-yufeng wants to merge 2 commits intoAstrBotDevs:masterfrom
Conversation
…sending NapCat and other OneBot protocol endpoints natively support file://, http(s)://, and base64:// URIs. Previously _from_segment_to_dict always called convert_to_base64() which forces downloading HTTP images and reading local files into memory just to encode them — wasting CPU, memory, and bandwidth. Now we check the source URI first and pass it through directly when the protocol endpoint can handle it. Only truly unknown formats fall back to base64 encoding. Closes AstrBotDevs#6717
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! 此拉取请求旨在优化 Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Hey - I've left some high level feedback:
- The
_resolve_file_urihelper does a synchronousos.path.existscheck on every absolute path, which adds blocking filesystem I/O to the hot path; consider trusting absolute paths without existence checks and letting the protocol/backend handle errors instead. - The expression
raw = segment.url or segment.file if isinstance(segment, Image) else segment.fileis a bit hard to read and relies on operator precedence; splitting this into an explicitif isinstance(segment, Image)branch will make the intended behavior forurlvsfileunambiguous.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `_resolve_file_uri` helper does a synchronous `os.path.exists` check on every absolute path, which adds blocking filesystem I/O to the hot path; consider trusting absolute paths without existence checks and letting the protocol/backend handle errors instead.
- The expression `raw = segment.url or segment.file if isinstance(segment, Image) else segment.file` is a bit hard to read and relies on operator precedence; splitting this into an explicit `if isinstance(segment, Image)` branch will make the intended behavior for `url` vs `file` unambiguous.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| @@ -1,4 +1,5 @@ | |||
| import asyncio | |||
| import os | |||
|
|
||
| # 裸路径,转成 file:// URI 让协议端自己读 | ||
| if os.path.isabs(raw) and os.path.exists(raw): | ||
| import pathlib |
本地路径发送逻辑可能存在问题如果astrBot 与 napcat 不在同一个服务器,或者说它们的数据卷不是共享的, 我看到你的 使用 os 模块来进行文件读取可能存在并发问题os 模块的操作是同步操作的,多个线程会按照顺序排队等待读取文件。这可能反而“反向中和”了原本被改善的本地文件发送优化 建议 1你可以再去捕获下发送失败的异常,然后捕获到了再去走统一base64流程,但是我不建议捕获异常然后弄兜底, 因为这样效率不太好。 建议 2我建议你弄个配置项,在 "prefer_base64": {
"description": "优先使用 Base64 发送媒体",
"type": "bool",
"hint": "开启后,图片、语音媒体文件将使用 Base64 编码发送,确保跨服务器兼容。关闭后,优先使用本地文件路径或网络 URL 发送,失败时自动回退到 Base64。仅当 AstrBot 与协议端在同一台机器时可关闭。",
},建议 3建议上 |
|
确实,跨服务器的场景没考虑到 — 本地 exists 过了 NapCat 那边根本拿不到文件。 建议 2 最合理,加个 prefer_base64 配置项,默认开着保持兼容。已经改好推上去了,顺便把文件读取换成了 aiofiles。 |
- Add `prefer_base64` config option (default: true) for aiocqhttp adapter. When enabled, always use base64 encoding for media to ensure cross-server compatibility. When disabled, try file:// and http:// passthrough first. - Replace sync `os.path.exists` with `aiofiles.os.path.exists` to avoid blocking the event loop on concurrent requests. - Fix operator precedence ambiguity in _resolve_file_uri by splitting the Image/Record branching into explicit if/else. - Move `pathlib` import to module level (PEP 8). Addresses review feedback from @FlanChanXwO and @sourcery-ai.
问题
_from_segment_to_dict处理Image和Record时,无条件调用convert_to_base64()。对于http://来源的图片,这意味着先下载到本地再 base64 编码;对于file://本地文件,也要全部读进内存编码。大图片场景下内存占用和 CPU 开销都很明显。但实际上 NapCat 等 OneBot 协议端本身就支持
file://、http(s)://、base64://三种格式,完全没必要多做一次转换。改动
新增
_resolve_file_uri()方法,在发送前先检查 segment 的file字段格式:file:///path/to/imghttp(s)://urlbase64://datafile://URI 后透传这样只有无法识别的格式才会走 base64,常见场景都避免了不必要的内存拷贝和编码。
测试
file:///路径 → 原样透传,不触发文件读取http://URL → 原样透传,不触发下载base64://数据 → 原样透传file://URICloses #6717
Summary by Sourcery
Optimize handling of Image and Record segments in aiocqhttp events to avoid unnecessary base64 conversions when a directly usable file URI is available.
Enhancements: